Artificial Intelligence in the Life Sciences
○ Elsevier BV
Preprints posted in the last 90 days, ranked by how well they match Artificial Intelligence in the Life Sciences's content profile, based on 11 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Roberts, K. F.; Abrams, Z. B.; Cappelletti, L.; Moqri, M.; Heugel, N.; Caufield, J. H.; Bourdenx, M.; Li, Y.; Banerjee, J.; Foschini, L.; Galeano, D.; Harris, N. L.; Li, M.; Ying, K.; Melendez, J. A.; Barthelemy, N. R.; Bollinger, J. G.; He, Y.; Ovod, V.; Benzinger, T. L. S.; Flores, S.; Gordon, B.; Ojewole, A. A.; Phatak, M.; Elbert, D. L.; Biber, S.; Landsness, E. C.; Mungall, C. J.; Bateman, R. J.; Reese, J.
Background: Advances in medicine depend on analyzing large and complex data sources, but discovery is partly constrained by the limited time and domain expertise of human researchers. Agentic artificial intelligence (agentic AI) can accelerate discovery by automating components of the scientific workflow, including information retrieval, data analysis, and knowledge synthesis. Aim: OpenScientist, an open-source agentic AI co-scientist, aims to accelerate biomedical discovery by semi-autonomously investigating scientist-defined queries and generating clinically relevant, verifiable scientific insights. Methods: Domain experts evaluated OpenScientist for novel discoveries in four clinical case studies: (1) a prespecified analysis in a community-based Alzheimer's disease biomarker cohort, (2) unsupervised modeling for plasma proteomic survival prediction, (3) hypothesis investigation in single-cell transcriptomic data from neurons with neurofibrillary tangles, and (4) hypothesis generation with validation in a multiple myeloma dataset with a randomized negative control. Results: OpenScientist completed analyses in minutes that otherwise would take weeks to months of human time and expertise. It identified %ptau217 as the best predictor of amyloid PET status, generated a plasma proteomic survival model with performance comparable to published models, proposed a mechanism linking tau pathology to altered lysosomal acidification, and generated multiple myeloma hypotheses that were validated in an external cohort while distinguishing true signal from randomized controls. Conclusion: OpenScientist demonstrates that open, auditable, agentic AI can support real-world clinical research by generating hypotheses, executing analyses, and discovering insights from complex datasets.
Singh, P.; Rath, S. L.
Background: Alzheimer's disease (AD) is a multifactorial neurodegenerative disorder in which copper dyshomeostasis, mitochondrial stress, oxidative injury and immune dysregulation may contribute to pathogenesis. Cuproptosis, a copper-triggered regulated cell death pathway, has emerged as a potential mechanistic link to AD, but its therapeutic and biomarker implications remain incompletely defined. Methods: We integrated transcriptomic, machine learning, immune infiltration, QSFR, molecular docking, docking validation and ADME analyses using GEO blood- and brain-based AD cohorts. Differentially expressed genes were intersected with curated cuproptosis-related genes, followed by pathway enrichment, construction and validation of a hybrid ensemble classifier, CIBERSORT-based immune correlation analysis, QSFR-driven target prioritization, ligand docking, consensus docking validation and SwissADME profiling. Results: The transcriptomic analyses revealed reproducible AD-associated signatures enriched in neurodegenerative, oxidative stress, mitochondrial and inflammatory pathways. Across multiple machine learning models, FDX1, PDHB, PDHA1, DLAT and DLD consistently emerged as the most important cuproptosis-related genes, with the hybrid ensemble achieving the best diagnostic performance. Immune profiling suggested that these genes are linked to distinct immune infiltration patterns. QSFR and docking prioritized FDX1 as a key target, and Clioquinol, PBT2 and Ebselen showed the strongest and most consistent binding behavior. Docking validation confirmed reliable pose reproduction and enrichment over decoys, while ADME analysis supported Clioquinol, PBT2 and Ebselen as the most balanced candidates for further consideration. Conclusion: This integrated workflow identifies a cuproptosis-centered mitochondrial gene module as a robust AD signature and highlights Clioquinol, PBT2 and Ebselen as promising repurposing candidates.
The findings provide a prioritized computational framework for future experimental validation of copper-linked therapeutic strategies in AD.
Bai, J.; Prince, S.; Nitschke, G. S.
Recent deep learning models for L1000 chemical perturbation prediction incorporate dedicated drug molecular encoders. We retrained seven such models from scratch with zeroed or shuffled drug inputs, and compared them with a multilayer perceptron that uses only cell-line basal expression. Under drug-blind evaluation, ablation caused negligible performance changes and the drug-free baseline matched all models. Current architectures do not yet utilise drug molecular features for generalisation to unseen compounds.
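The two ablations named in this abstract (zeroed and shuffled drug inputs) can be illustrated with a minimal sketch. This is not the authors' code: the helper names `zero_drug_inputs` and `shuffle_drug_inputs` are hypothetical, and drug features are assumed to be plain lists of floats, one vector per training sample.

```python
import random


def zero_drug_inputs(drug_features):
    """Replace every drug feature vector with zeros of the same length."""
    return [[0.0] * len(vec) for vec in drug_features]


def shuffle_drug_inputs(drug_features, seed=0):
    """Permute drug vectors across samples.

    The set of vectors is unchanged, but the pairing between a sample
    and its true drug is broken, so any signal a model gets from the
    drug encoder can no longer be tied to the correct compound.
    """
    rng = random.Random(seed)
    shuffled = list(drug_features)  # copy so the input is not mutated
    rng.shuffle(shuffled)
    return shuffled


# Toy feature matrix (hypothetical values): three samples, two features each.
drug_features = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.3]]
zeroed = zero_drug_inputs(drug_features)
shuffled = shuffle_drug_inputs(drug_features)
```

If a model retrained on `zeroed` or `shuffled` inputs scores about the same as one trained on the real features, the drug encoder is contributing little, which is the pattern the abstract reports.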
Guo, J.
The rapid growth of molecular foundation models and large language models has encouraged a scale-centred view of AI in drug discovery, in which larger pretrained models are expected to supersede compact cheminformatics models and graph neural networks (GNNs) trained for individual tasks. We test this assumption across 26 endpoints for molecular properties, toxicity, safety liabilities and biological activity, grouped into ADME, toxicity and bioactivity classes. The benchmark contains 78 endpoint and split entries spanning random, Murcko scaffold and structure-separated 5-fold CV. Ordered from easiest to hardest, these splits approximate retrospective evaluation on a closed library, scaffold expansion in hit-to-lead, and library expansion on novel chemotypes. Each entry includes ML, GNN, pretrained molecular sequence and LLM-based SAR families. Across 156 fold-mean comparisons, classical ML such as RF(ECFP4) and ExtraTrees(RDKit) win 116, GNNs such as GIN and Ligandformer win 25, pretrained sequence models such as MoLFormer and ChemBERTa2 win 12, and LLM-based SAR baselines win three. ML dominates random-split interpolation but loses part of this advantage under harder splits; GNN and sequence models also decline but gain relative ground, whereas LLM-based SAR is weaker in absolute terms yet less sensitive to the split axis. Paired bootstrap analyses support family-level trends more strongly than individual model rankings. SAR knowledge derived from training folds improves many GPT5.5-SAR and Opus4.7-SAR metrics but does not make rule-based reasoning a universal substitute for supervised predictors. Compact specialized models remain highly effective for molecular property and activity prediction. Larger models add value for SAR interpretation and reasoning in low-data settings, but predictive performance depends on the fit among model, task and validation scenario, not on scale alone.
Jovanovic, M.; Weidener, L. S.; Brkic, M.; Ulgac, E.; Meduri, A.
Drug-induced inhibition of the hERG potassium channel is the leading cause of cardiac safety-related drug attrition, but the Comprehensive in Vitro Proarrhythmia Assay (CiPA) framework requires activity data on multiple cardiac ion channels to assess proarrhythmic risk. We present CardioSafe, a three-branch multi-task neural network with cross-attention fusion that integrates chemical fingerprints, ChemBERTa embeddings, and predicted L1000 transcriptomic features to predict blocker status and potency for hERG, Nav1.5, and Cav1.2, with an exploratory IKs head. CardioSafe was trained on the largest publicly reported multi-channel cardiac ion channel dataset, combining ChEMBL 36 with the hERGCentral database (331127 hERG, 3160 Nav1.5, 1138 Cav1.2, and 115 IKs compounds), curated under a pharmacology-aware policy that retains censored measurements and inhibition-percentage votes. Under Tanimoto-similarity-controlled splits, CardioSafe outperforms the leading published comparators (CToxPred2 and CardioGenAI) on the data-rich hERG head; on the smaller Nav1.5 and Cav1.2 heads the standard evaluation is statistically inconclusive. A reverse-leak audit revealed that 22% of Nav1.5 and 21% of Cav1.2 test compounds were present in published comparators' training data (92% as exact compound matches); after removing these contaminated compounds, CardioSafe's lead on Nav1.5 and Cav1.2 also reaches statistical significance, demonstrating that prior cross-publication benchmarks for these channels were inflated by training-data overlap. Scientific contribution: We present the first multi-task neural network jointly predicting blocker activity for the three primary CiPA cardiac ion channels (hERG, Nav1.5, Cav1.2) within a single architecture. We introduce a reverse-leak audit methodology that reveals systematic test-set contamination in cross-publication cardiac safety benchmarks, establishing a stricter evaluation protocol.
We provide an empirical test of predicted L1000 transcriptomic features as auxiliary input for cardiac ion channel prediction and document a well-characterized negative result. Graphical abstract: CardioSafe encodes each query SMILES with three branches (chemical fingerprints + descriptors, pretrained ChemBERTa, and predicted L1000 transcriptomic signatures), fuses them via a cross-attention block with four learnable per-channel query tokens, and emits binary blocker calls plus pChEMBL regression for hERG, Nav1.5, Cav1.2, and (exploratory) IKs.
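The idea behind a reverse-leak audit, checking whether one's own test compounds appear in a comparator's training data, can be sketched as follows. This is an illustrative simplification, not the CardioSafe implementation: `tanimoto` and `reverse_leak_audit` are hypothetical names, fingerprints are assumed to be sets of on-bit indices, and an identical fingerprint stands in for an exact compound match (a real audit would more likely compare canonical structure identifiers).

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def reverse_leak_audit(test_fps, comparator_train_fps, threshold=1.0):
    """Flag test compounds whose similarity to any comparator training
    compound is at or above `threshold`.

    Returns (leaked_ids, clean_ids). With threshold=1.0 only identical
    fingerprints are flagged; lower values also flag near-duplicates.
    """
    leaked = set()
    for cid, fp in test_fps.items():
        if any(tanimoto(fp, tfp) >= threshold
               for tfp in comparator_train_fps.values()):
            leaked.add(cid)
    clean = set(test_fps) - leaked
    return leaked, clean


# Toy fingerprints (hypothetical): keys are compound IDs, values are on-bits.
test_fps = {"t1": {1, 2, 3}, "t2": {4, 5}, "t3": {7, 8, 9}}
train_fps = {"c1": {1, 2, 3}, "c2": {4, 5, 6}}

exact_leaked, exact_clean = reverse_leak_audit(test_fps, train_fps)
near_leaked, near_clean = reverse_leak_audit(test_fps, train_fps, threshold=0.6)
```

Re-scoring every model on the clean subset only is what lets a comparison separate genuine generalisation from memorised training compounds.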
Acitores Cortina, J. M.; Schut, M. C.; Tatonetti, N. P.
Drug-induced arrhythmias, particularly Torsades de Pointes (TdP), pose a significant risk to patient safety and can sometimes have life-threatening outcomes. They remain a major concern in drug development and regulation. Machine learning (ML) has become a powerful tool for analyzing complex biological and chemical datasets, enabling researchers to identify subtle patterns that differentiate safe compounds from those likely to cause dangerous cardiac effects. However, most existing in silico approaches do not sufficiently incorporate biological elements, relying heavily on chemical and structural properties or on computationally expensive simulations. Here, we introduce BioMADE, a novel ML framework that harnesses small-molecule-protein activity profiles from publicly available datasets to predict TdP risk without requiring exhaustive mechanistic annotation. Activity data from ChEMBL were used to train individual models for each gene, which predict activity values for any given compound. A curated set of arrhythmia-relevant genes was then used to construct a latent biological embedding (BioMADE embedding) for each molecule. We validated the performance of these features in distinguishing biological elements such as ATC3 class, showing superior classification performance compared with representations such as Molformer (lacks biological information) and MACCS (limited chemical properties) (0.85 AUROC vs 0.81 and 0.73, respectively). BioMADE representations served as input to a support vector machine classifier to discriminate TdP-inducing drugs from safe compounds. BioMADE achieved an AUROC of 0.89 in internal validation, indicating strong predictive performance. Against state-of-the-art models such as ADMEThyst, BioMADE achieved an AUROC of 0.74 on ADMEThyst's validation set (vs. 0.72 for ADMEThyst). When we combined both approaches, the AUROC reached 0.77.
These results demonstrate that BioMADE provides a scalable, biology-informed, and generalizable approach for predicting drug-induced toxicities. By integrating protein activity profiles into toxicology modeling, our framework highlights the critical role of human biology in adverse drug reaction prediction, an aspect often overshadowed by purely chemical or structural descriptors.
Pinero, S. L.; Li, X.; Lee, S. H.; Liu, L.; Li, J.; Le, T. D.
Long COVID affects millions of people worldwide, yet no disease-modifying treatment has been approved, and existing interventions have shown only modest and inconsistent benefits. A key reason for this limited progress is that current computational drug repurposing pipelines do not match well with the clinical reality of Long COVID. These patients often have persistent, multi-systemic symptoms and may already be taking multiple medications, making treatment safety a primary concern. However, most repurposing workflows still treat safety as a downstream filter and rely on disease-associated targets rather than causal drivers. They also assume that the findings of one analysis would generalize across the diverse presentations of Long COVID. We introduce SPLIT, a safety-first repurposing framework that addresses these limitations. SPLIT prioritizes safety at the start of the candidate evaluation, integrates complementary causal inference strategies to identify likely driver genes, and uses a counterfactual substitution design to compare drugs within specific cohort contexts. When applied to cognitive and respiratory Long COVID cohorts, SPLIT revealed three main findings. First, drugs with similar predicted efficacy could have very different predicted safety profiles. Second, the drugs flagged as unfavorable were often different between the two cohorts, showing that drug prioritization is phenotype-specific. Third, SPLIT flagged 18 drugs currently under active investigation in Long COVID trials as having unfavorable predicted profiles. SPLIT provides a practical framework to identify safer, more context-appropriate candidates earlier in the process, supporting more targeted and better-tolerated treatment strategies for Long COVID.
Lalagkas, P. N.; Melamed, R. D.
Most clinical trials fail due to either lack of efficacy or safety concerns. Human genetics can address both failure reasons: disease-associated genes are not only promising therapeutic targets but also predict drug side effects. However, because the same genetic signal underlies both outcomes, we need methods that disentangle which disease genes mediate therapeutic benefit versus adverse side effects. We use DraphNet, our previously developed model that maps drug molecular effects onto disease genes to generate two gene sets per drug: one linked to its therapeutic effects (IND genes) and one linked to its side effects (SE genes). We show that IND and SE genes overlap for 76% of the tested drugs (compared to a null model). We also show that drugs sharing greater IND similarity also have greater SE similarity (ρ = 0.57, p < 1e-300). To show how our approach enables insights into drug biology, we construct groupings of drugs based on their IND and SE genes. We find that drugs in the same IND grouping are enriched for co-occurrence in the same SE grouping (OR = 212.37). We present two examples to illustrate the kind of insights this network enables: identification of drugs with shared IND but distinct SE genes as repurposing candidates, and identification of drugs with shared SE but distinct IND genes to assist treatment selection in patients with comorbidities. Finally, we develop a neural network that directly links drug molecular effects onto disease genes and learns a gene-level score that quantifies each gene's relative contribution to drug therapeutic versus side effects on disease.
Abbott, J. M.
Machine learning models for protein-ligand bioactivity prediction are increasingly used in computational drug discovery. However, reported benchmark performance is often sensitive to evaluation design. To further understand evaluation design strategies, we present a systematic evaluation of seven machine learning architectures for kinase inhibitor bioactivity prediction, spanning classical baselines (Random Forest, XGBoost, ElasticNet, multi-layer perceptron) and advanced neural approaches (Graph Isomorphism Network, ESM-2 protein embedding MLP, and a GNN-ESM fusion model). Using a curated ChEMBL-derived kinase activity dataset of 352,874 records across 507 human protein kinase targets, we evaluated all models under three splitting strategies of increasing stringency: random, scaffold-based (Bemis-Murcko), and target-held-out. We observed that Random Forest with Morgan fingerprints achieves near-equivalent or superior performance to all neural architectures under scaffold and target-based evaluation. On target-held-out splits frozen ESM-2 embeddings showed worse generalization, with ESM-FP MLP exhibiting the largest performance degradation. Learned graph representations (GIN) do not outperform fixed 2048-bit ECFP4 fingerprints at this data scale, and tree-based uncertainty methods outperform MC-Dropout implementations tested here on calibration and selective prediction metrics. A JAK kinase subfamily case study shows that protein-aware models achieved 79% top-1 selectivity accuracy versus 52% for pooled fingerprint models. However, stronger baselines using explicit target identity achieved 83-84%, indicating that ESM-2 embeddings in this study functioned primarily as an implicit target identifier. These results indicate that evaluation methodology and statistical rigor are major determinants of reported performance in bioactivity prediction. 
Figure 1. Benchmark design overview. A curated ChEMBL kinase bioactivity dataset (352,874 records, 507 targets) was evaluated under three splitting strategies of increasing stringency. Seven model architectures spanning baselines, protein-aware, and graph neural approaches were each trained under 5-seed replication (105 total runs), with results analyzed across three complementary branches: the main 507-target benchmark, ESM-2 embedding ablation studies on a clean 92-target subset, and a JAK-family selectivity case study with stronger target-conditioned baselines.
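Scaffold-based and target-held-out splits share one mechanism: whole groups of records, not individual records, are assigned to the test set. The sketch below illustrates that mechanism only and is not the benchmark's code. `group_holdout_split` is a hypothetical helper, and the scaffold and target labels are assumed to be precomputed strings (for example, Bemis-Murcko scaffolds generated beforehand with a cheminformatics toolkit).

```python
import random
from collections import defaultdict


def group_holdout_split(records, group_of, test_fraction=0.2, seed=0):
    """Split records so that no group appears in both train and test.

    `group_of` maps a record to its group label (scaffold, target, ...).
    Groups are shuffled, then assigned whole to the test set until it
    holds at least `test_fraction` of the records; the rest go to train.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[group_of(rec)].append(rec)

    keys = sorted(groups)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)

    n_test_target = test_fraction * len(records)
    train, test = [], []
    for key in keys:
        bucket = test if len(test) < n_test_target else train
        bucket.extend(groups[key])
    return train, test


# Toy records (hypothetical labels): 5 scaffolds x 2 targets = 10 records.
records = [
    {"scaffold": s, "target": t}
    for s in ["A", "B", "C", "D", "E"]
    for t in ["JAK1", "JAK2"]
]

# Scaffold split: group by scaffold. Target-held-out: group by target.
sc_train, sc_test = group_holdout_split(records, lambda r: r["scaffold"])
tg_train, tg_test = group_holdout_split(
    records, lambda r: r["target"], test_fraction=0.5
)
```

A random split, by contrast, shuffles individual records, so close analogues of a test compound usually remain in training; that difference is a common reason scores drop under the stricter splits.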
Liu, T.; Jiang, S.; Zhang, F.; Sun, K.; Head-Gordon, T.; Zhao, H.
Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages and limitations over traditional drug discovery platforms. To tackle this emergent problem, we have developed DrugPlayGround, a framework to evaluate and benchmark LLM performance for generating meaningful text-based descriptions of physicochemical drug characteristics, drug synergism, drug-protein interactions, and the physiological response to perturbations introduced by drug molecules. Moreover, DrugPlayGround is designed to work with domain experts to provide detailed explanations that justify the predictions of LLMs, thereby testing LLMs for chemical and biological reasoning capabilities and encouraging their wider use at the frontier of drug discovery across all of its stages.
Potter, H. G.
Generative artificial intelligence (genAI) tools are increasingly used by prospective higher education (HE) applicants seeking guidance on university and programme selection. Despite rapidly expanding use, little is known about how genAI systems may introduce or amplify bias in undergraduate admissions decision-making. Here, we systematically examined patterns of bias across three widely used genAI chatbots (ChatGPT, Copilot, Gemini) using neuroscience as a representative UK undergraduate programme. We constructed 216 prompts that varied by applicant characteristics (e.g. gender, study type, academic attainment). Each prompt was submitted to all three chatbots, generating 648 responses and 3240 individual programme recommendations. Output responses underwent text analysis (e.g. n-grams, gender-coded language), and national HE markers of esteem (REF21, TEF23, NSS24) were analysed. Applicant grades and priorities produced the strongest effects on genAI outputs. Higher-grade applicants and those prioritising research received significantly more masculine-coded language, independent of applicant gender. N-gram patterns also diverged: high-grade prompts more frequently elicited terms relating to excellence and research intensity, whereas lower-grade prompts produced greater emphasis on widening access. Recommendations were systematically skewed, with higher grades, private schooling, and research-focused priorities increasing the likelihood of recommending elite institutions and programmes with higher entry requirements. Critically, the gender-coded language of outputs predicted institutional characteristics: masculine-coded responses were associated with recommendations featuring higher entry thresholds and stronger research performance, while feminine-coded responses favoured institutions with higher student satisfaction. These findings reveal clear, systematic biases in how genAI guides prospective HE applicants. 
Such biases risk reinforcing existing educational and socioeconomic inequalities, underscoring the need for transparency, regulation, and oversight in the use of genAI within HE decision-making. Highlights: GenAI is widely used by HE applicants despite little study of its biases. 216 prompts across 3 chatbots generated 3240 programme suggestions. Grades and priorities drove major shifts in language and recommendations. Gender-coded wording mapped onto research strength and entry standards. GenAI biases may reinforce inequalities in HE admissions decision-making.
Dohi, E.
We screened a 5 receptor x 7 aptamer = 35-cell cross-target matrix with HADDOCK3 [1] under blind ambiguous-interaction-restraint (AIR) protocols on AlphaFold-modelled receptors. The screen surfaced 12 operationally distinct failure modes (collapsing to ~8 conceptual classes; §3.1). The K_D-calibration subset is n = 4 cells with literature K_D records under matched assay conditions; the broader cohort includes ≥ 6 biological cognate or intended-cognate cells. The principal case study is P01031 (complement C5, 1676 aa, ≥ 12 structural domains): all 7 panel members produced positive HADDOCK3 top-1 scores under a scale-adaptive AIR. Score-term decomposition locates the anomaly in the AIR term (+217 to +268 to top-1 score). With AIR zeroed, scores fall to -131 to -74 -- the small-receptor regime. Boltz-2 cofolding chain-pair ipTM (cpi_AB) is an independent channel: P01031 shows the lowest median cpi_AB (0.211; 0/7 above the 0.5 confident-interface threshold). To our knowledge, this is the first reported case study of a 1676 aa multi-domain receptor exhibiting this signature under blind scale-adaptive AIR -- an n = 1 mechanistic case, not a statistical generalisation. We adapt the QSAR applicability domain concept [14-16] to in silico aptamer screening. §3.7 reports an empirical Mode 1 mitigation (pLDDT-aware AIR prefilter; cohort Jaccard recovery ~10x).
Ke, J.; Melamed, R. D.
Understanding which disease genes are altered by a drug can provide insight into the biology of effect, help us understand adverse drug effects, and suggest new drug uses. Here, we build on our model Draphnet in a new formulation with a similar goal. Draphnet was designed to explain drug therapeutic and side effects by learning a network connecting drugs to the disease genes they alter. Our new model, DraPhormer, learns drug-to-gene connections with a transformer model instead of relying on a linear model. DraPhormer integrates drug molecular data, disease genetics, and known drug effects on diseases, along with language models representing all of these entities. We show in simulations that DraPhormer can explain the genetic mechanisms of drug effects. Then, we present our design for incorporating drug and disease biology into the model. Finally, we benchmark the model's ability to learn drug indications and side effects in real data.
Chaidos, N.; Dimitriou, A.; Calzi, H.; Casiraghi, E.; Stamou, G.; Valentini, G.
Counterfactual Explanation (CE) algorithms have been successfully applied to uncover the main factors driving computational diagnostic and prognostic predictions on tabular medical data. Recently, a new Network Medicine paradigm has been introduced for patient diagnosis and prognosis using Patient Similarity Networks (PSNs), i.e. graphs where patients are represented as nodes and their clinical and biomolecular similarities as edges. In this context, graph-based algorithms, including Graph Neural Networks (GNNs), can provide predictions using not only individual patient features but also their relations within a network of clinically and biomolecularly similar individuals. In this work, we propose the first CE algorithm tailored to explain diagnostic and prognostic predictions within PSNs. Alongside a contrastive GNN backbone, we introduce a versatile, model-agnostic counterfactual search method compatible with any underlying classifier. Preliminary results on synthetic data and on a cohort of patients affected by Alzheimer's disease show that our algorithm is competitive both with seminal tabular-based CE algorithms and with GNNExplainer, a well-established method for explaining graph-based classification tasks.
Liu, G.; He, M.; Sun, L.; Cheng, F.; Zhang, Y.
Large language model (LLM) agents have automated tool use in chemistry, but orchestrating multi-step computational biology workflows--spanning structure prediction, protein design, and covalent conjugation--remains manually intensive. Here we present Open Intelligence Hub (OIH), an autonomous LLM-agent platform that dynamically plans and executes 32 containerised tools for protein binder design and antibody-drug conjugate (ADC) prioritization. OIH introduces tier-based decision routing, ipSAE-guided interface filtering, and failure-to-knowledge distillation from 265 curated cases. Across five oncology targets, the agent correctly classified all five evaluated targets and required human correction for hotspot selection in only one case, producing binders ranked by ipSAE (Nectin-4 ipTM = 0.87, HER2 ipTM = 0.85). A controlled ablation suggests that the agent's PPI-informed routing yields better downstream ipTM and ipSAE scores than epitope-guided alternatives. The LLM-agnostic architecture enables deployment with local or commercial models without pipeline changes. All results are computational predictions awaiting experimental validation.
Brown, S. M.; Cohen, A. B.; Dean, S. N.
Proteins are highly diverse functional polymers where the specific sequence of amino acids, selected from a standard genetically-encoded alphabet of twenty (C20), determines the structure and ultimately the function of the resulting folded protein. This standard alphabet has been identified to be non-randomly distributed in physicochemical properties crucial to both structure-formation and function, often referred to as coverage theory. While machine learning models have drastically improved protein structure prediction, protein design has yet to see similar progress. Here we therefore bridge contemporary biological theory with recent advancements in artificial intelligence (AI) to develop and evaluate a generative AI protein design model, trained on hundreds of thousands of proteins within the RCSB PDB, for custom secondary structure motifs using reduced amino acid alphabets. Results indicate an overall success in designing novel proteins with desired secondary structure motifs for a broad range of amino acid alphabets. Interestingly, this tool often captures the full three-dimensional tertiary structure of a target protein despite training only on physicochemical sequence space and DSSP secondary structure. The development of this model advances research across multiple disciplines, from general scientific AI/ML architecture development to protein design for biotechnology, astrobiology, and early-Earth evolutionary biology.
Ferreyra, S.; Dutra, I.; Galeano, A.; Paccanaro, A.
Drug-target affinity (DTA) prediction is a key task in drug discovery, enabling the estimation of the interaction strength between candidate compounds and biological targets. However, current models rely on connectivity-based molecular representations and do not explicitly account for the spatial organization of atoms, also known as stereochemistry. This limitation becomes evident when considering chirality, where a drug can exist as enantiomers, i.e., molecules that share the same atoms and bonds but differ in their three-dimensional arrangement. Despite their chemical similarity, they can interact differently with the same target, leading to variations in binding affinity and biological activity. In this paper, we propose a stereochemistry-aware DTA prediction framework that incorporates this information into molecular representations. Drug representations are learned from chemical structure using a directed-bond message passing graph neural network that captures enantiomer configurations, while protein targets are represented through sequence-based embeddings. Experiments on the Davis dataset demonstrate that our model can improve affinity prediction. Importantly, a case study on a manually curated dataset of enantiomers with different biological action shows that the model is able to distinguish the affinities of the two forms, consistent with their experimentally observed biological activity. These findings support the relevance of stereochemistry-aware molecular representation for more accurate and chemically faithful DTA prediction.
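The limitation described in this abstract, that a connectivity-only representation cannot tell enantiomers apart, can be shown with a toy example. The sketch below is an illustration under simplifying assumptions, not part of the paper's framework: `strip_tetrahedral_stereo` is a hypothetical helper that removes the `@` chirality markers from a SMILES string by plain text substitution, which is adequate for this small example but is no substitute for a cheminformatics toolkit.

```python
def strip_tetrahedral_stereo(smiles):
    """Drop tetrahedral chirality markers ('@' and '@@') from a SMILES string.

    Naive text substitution for illustration only: it mimics what a
    connectivity-based representation "sees" by discarding the
    information that distinguishes mirror-image forms.
    """
    return smiles.replace("@", "")


# The two enantiomers of alanine differ only in the chirality marker.
alanine_form_1 = "C[C@H](N)C(=O)O"
alanine_form_2 = "C[C@@H](N)C(=O)O"

flat_1 = strip_tetrahedral_stereo(alanine_form_1)
flat_2 = strip_tetrahedral_stereo(alanine_form_2)

# With stereo markers the strings differ; without them they are identical,
# so a model fed only connectivity receives the same input for both forms
# and must predict the same affinity for both.
distinguishable_with_stereo = alanine_form_1 != alanine_form_2
distinguishable_without_stereo = flat_1 != flat_2
```

Any representation that collapses the two forms in this way forces identical predictions for them, which is why stereo-aware encodings are needed when enantiomers bind a target differently.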
Allen, T. E. H.; Bonnet, M.; Khan, R. T.
We introduce the Serna Bio GenAI platform, a generative chemistry and multiparametric optimization platform for the design of RNA-targeting small molecules. Targeting RNA with small molecules has proven historically challenging but offers notable potential upsides, including access to unique mechanisms of action and the ability to target otherwise untargetable genes. We consider a major challenge here to be designing chemistry specific to RNA targeting. Molecular design is a valuable application of AI in drug discovery, but many publicly available models use training data focused on protein targeting, historically the best-explored modality in drug discovery. We showcase the difference and value in building a specifically RNA-targeting platform, comparing its performance to state-of-the-art public chemical generators and experimentally validating its chemical designs in comparison to chemistry designed by a human expert.
Wu, R.; Mao, L.; Diao, Y.; Li, H.
Drafting Markush claims for chemical patents remains difficult because manual claim writing is slow, error-prone, and often fails to capture related chemical space in a systematic manner. We developed SpaceExpander, a computational method that converts disclosed compounds into generalized Markush claims by extracting core scaffolds, defining variable positions, decomposing complex substituents, and expanding substituent space through fragment matching. We evaluated the method on 24 publicly available chemical patents and compared its performance with IntelliPatent. SpaceExpander achieved a mean atom-level scaffold accuracy of 0.92 and exactly recovered the reference scaffold in 19 of 24 patents. By contrast, IntelliPatent could process only 2 patents from the same set, indicating more limited applicability to structurally diverse cases. We further examined practical claim coverage in a case study based on the Osimertinib patent. Using representative disclosed compounds as input, SpaceExpander drafted a Markush claim that covered 5 of 7 additional approved third-generation EGFR inhibitors beyond Osimertinib. These results show that SpaceExpander is a validated method for automated Markush claim drafting and chemical space expansion.
Sun, K.; Wang, Y. E.; Purnomo, J. C.; Cavanagh, J. M.; Alteri, G. B.; Head-Gordon, T.
Fragment-based drug discovery (FBDD) relies heavily on the design of chemically viable linkers to connect fragments binding to different pocket regions into potent lead molecules. While recent generative models have advanced spatial fragment linking, they frequently produce linkers characterized by high torsional strain and non-drug-like motifs. In this work, we present LinkLlama, a fine-tuned Meta Llama 3 model that bridges the gap between text-based generation and 3D spatial awareness. By accepting natural language prompts that specify geometric constraints, such as distances and angles, alongside physicochemical targets like Lipinski's rules and rotatable bond limits, LinkLlama generates highly tailored molecules for the input fragments. Leveraging the inherent chemical grammar captured through supervised fine-tuning on a curated corpus of drug-like molecules from ChEMBL, the model prioritizes chemical validity without requiring complex reinforcement learning loops. Benchmarking on the ZINC and HiQBind datasets demonstrates that LinkLlama maintains competitive geometric fidelity compared to strictly 3D-aware models while achieving a two-fold increase in the proportion of chemically reasonable designs. This success rate, which rises from 35% to over 80%, is defined by strict adherence to comprehensive structural filters including PAINS, non-drug-like chemical patterns and complex ring systems. We further illustrate the model's versatility through prospective case studies in novel small-molecule scaffold hopping and PROTAC linker design, validated via molecular docking and molecular dynamics simulations against known crystal poses. Ultimately, LinkLlama demonstrates that large language models can overcome the structural pitfalls of purely 3D-generative methods, offering a highly controllable and chemically robust framework to accelerate linker design and drug discovery in general.